{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-9dea5c81cd34f5a8",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Exercise 04): Monte-Carlo Methods\n",
    "\n",
    "In this exercise we make use of the racetrack environment (racetrack_environment.py) to test Monte-Carlo methods.\n",
    "\n",
    "The racetrack environment is based on the OpenAI Gym interface (https://gymnasium.farama.org/) depicted in the picture below.\n",
    "\n",
    "![](RL_GYM_racetrack.png)\n",
    "\n",
    "(Source: Wiki, https://www.vecteezy.com/free-vector/car)\n",
    "\n",
    "The agent can send an action to the system - our racetrack env - using the `env.step(action)` function to drive the car in the environment which is given by the following racetrack: \n",
    "\n",
    "![](Racetrack1.png)\n",
    "\n",
    "Here, the red line represents the start line and the goal is to move the car within the yellow course to the white finish line without hitting the wall. \n",
    "If the car hits the wall, it must be returned to the starting line. \n",
    "The information we get from the step function of the environment are\n",
    "- state consisting of the y- and x-postion (`p_y` and `p_x`) and the velocity in x- and y-direction (`v_y` and `v_x`),\n",
    "- `reward`, which will be -1 per step,\n",
    "- `terminated`-flag which indicates if the environment is terminated (in our case if the car has reached the finish line),\n",
    "- `truncated`-flag which is a termination condition outside of the MDP scope, e.g. timelimit, (in our case hitting a wall before the car has reached the finish line),\n",
    "- info (addioninal information, not used here).\n",
    "\n",
    "Our possible actions are to accelerate (positive acceleration) or break (negative acceleration) the car in x- and/or y-direction or do nothing.\n",
    "\n",
    "Accelerating the car will result in changing the velocity of the car as follows:\n",
    "\n",
    "![](Beschleunigen.png)\n",
    "\n",
    "Breaking the car will result in changing the velocity of the car as follows:\n",
    "\n",
    "![](break.png)\n",
    "\n",
    "Our possible action-space is therefore `[-1, 0, 1]` which are availabe as tuple or integer number and encoded as explained later on.\n",
    "\n",
    "Actions are encoded from a single integer (`a`) to the tuple (`a_y`, `a_x`) using the following equations:\n",
    "\n",
    "- `a_y = a//3-1` (Floor division)\n",
    "- `a_x = a%3-1` (Modulus)\n",
    "\n",
    "This is shown in the following diagram:\n",
    "\n",
    "![](Direction_endcoding.png)\n",
    "\n",
    "Please make yourself more familiar with the used environment (racetrack_environment.py) for more information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the start, please execute the following cells.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-9c8cfa434031df78",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import random\n",
    "from racetrack_environment import RaceTrackEnv\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.animation as animation\n",
    "from tqdm.notebook import tqdm\n",
    "plt.style.use('dark_background')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-46112ad628791ed0",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Execute the following cells to build a race track using the `RaceTrackEnv` as a test scenario.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-ab28c0c5fbe2404e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "from utils import build_uturn_course"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WWWWWWWWWWWW\n",
      "WWWWW-oooooW\n",
      "WWWWW-oooooW\n",
      "WWWWW-oooooW\n",
      "WWWWWWWWWooW\n",
      "WWWWWWWWWooW\n",
      "WWWWW+oooooW\n",
      "WWWWW+oooooW\n",
      "WWWWW+oooooW\n",
      "WWWWWWWWWWWW\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeIAAAGdCAYAAADOnXC3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAUkklEQVR4nO3da4xV5bnA8WeQSwPu0Q8qlLH1CtqGiuXirahtJ6Tai2iaQMFEjLGt0poQ2xQn2EQxKTRNBCXT1tiI1UhS00RbS00rxF5EkaJRI2pTKuNl0FEKcUYHGQrv+eA50zMVZTYy82w2v1/yfpjFWqzHBey/a+89exoiogQAkGJI9gAAcCgTYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEg3NOOnYsWOjq6sr49QAMCgqlUps2bJln/sNeojHjh0b7e3tg31aABh0TU1N+4zxoIf4/+6ET25qirfdFQNQhw6vVGJTe3u/nv1NeWo6IuLtri5PTwNwyPNmLQBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASDRfoV43rx5sXnz5tixY0esW7cupk6deqDnAoBDQtUhnjlzZtx8881x4403xqRJk+Lpp5+OP/zhD3H00UcPxHwAUNeqDvG1114bt99+e9x5553x/PPPx1VXXRXd3d1xxRVXDMR8AFDXqgrxsGHDYvLkybF69erebaWUWL16dZx99tl7PWb48OFRqVT6LADgPVWF+KijjoqhQ4dGR0dHn+0dHR0xZsyYvR7T0tISnZ2dvau9vX3/pwWAOjPg75pevHhxNDY29q6mpqaBPiUAHDSGVrPz1q1b49///neMHj26z/bRo0fH66+/vtdjenp6oqenZ/8nBIA6VtUd8a5du+KJJ56I5ubm3m0NDQ3R3Nwcjz322AEfDgDqXVV3xBERN998c/zyl7+MDRs2xPr162P+/PkxatSoWLFixUDMBwB1reoQ33vvvXH00UfHokWLYsyYMfHUU0/FBRdcEG+88cZAzAcAda0hIspgnrBSqURnZ2eMaWyMrq6uwTw1AAyKSqUSr3d2RmM/WuezpgEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEVf/QB2DfNmcPsBfHPJ49AQysUWdmT7B/3BEDQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiaoK8XXXXRfr16+Pzs7O6OjoiPvuuy/Gjx8/ULMBQN2rKsTnn39+tLa2xllnnRXTp0+PYcOGxR//+McYOXLkQM0HAHVtaDU7X3jhhX2+vvzyy+PNN9+MyZMnx1//+tcDOhgAHAqqCvF/O+KIIyIiYtu2bR+4z/Dhw2PEiBG9X1cqlY9ySgCoK/v9Zq2GhoZYtmxZPPLII7Fx48YP3K+lpSU6Ozt7V3t7+/6eEgDqzn6HuLW1NSZMmBDf+MY3PnS/xYsXR2NjY+9qamra31MCQN3Zr6emly9fHl/96lfjvPPO2+cdbk9PT/T09OzXcABQ76oO8fLly+OSSy6Jz3/+89HW1jYAIwHAoaOqELe2tsacOXNixowZ0dXVFaNHj46IiLfeeivefffdARkQAOpZVa8Rz5s3L4488sj485//HK+//nrvmjVr1kDNBwB1rao74oaGhoGaAwAOST5rGgASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImGZg8A9eiE7AH25szsAdhf7zyePQEDyR0xACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgEQfKcQLFiyIUkosXbr0QM0DAIeU/Q7xlClT4tvf/nY8/fTTB3IeADik7FeIR40aFffcc09885vfjO3btx/omQDgkLFfIW5tbY1Vq1bFmjVr9rnv8OHDo1Kp9FkAwHuGVnvArFmzYtKkSTF16tR+7d/S0hI33HBDtacBgENCVXfExx57bNxyyy1x6aWXxs6dO/t1zOLFi6OxsbF3NTU17degAFCPqrojnjx5cowePTqefPLJ//wGQ4fGeeedF9/97ndjxIgRsWfPnj7H9PT0RE9Pz4GZFgDqTFUhXrNmTUyYMKHPthUrVsQLL7wQP/7xj98XYQDgw1UV4rfffjs2btzYZ9s777wT//rXv963HQDYN5+sBQCJqn7X9H/7whe+cCDmAIBDkjtiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBqaPQDUo3dKyR7h/dY3ZE8A7IU7YgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJqg7x2LFj4+67746tW7dGd3d3PPPMMzF58uSBmA0A6l5VP4/4yCOPjLVr18bDDz8cF154Ybz55psxbty42L59+0DNBwB1raoQL1iwIF555ZW44oorere1tbUd6JkA4JBR1VPTF110UWzYsCHuvffe6OjoiCeffDKuvPLKDz1m+PDhUalU+iwA4D1VhfjEE0+Mq6++Ov7xj3/El770pfjZz34Wt956a1x22WUfeExLS0t0dnb2rvb29o88NADUi4aIKP3deefOnbFhw4b43Oc+17vtlltuialTp8Y555yz12OGDx8eI0aM6P26UqlEe3t7jGlsjK6urv2fHGrYO6Xf/6wGz/qG7AlgQI06M3uC/6hUKvF6Z2c09qN1Vd0Rv/baa/Hcc8/12fb888/HJz/5yQ88pqenJ7q6uvosAOA9VYV47dq1ccopp/TZNn78+HjppZcO6FAAcKioKsRLly6Ns846K1paWuKkk06K2bNnx7e+9a1obW0dqPkAoK5VFeINGzbEJZdcErNnz45nn302fvjDH8b8+fNj5cqVAzUfANS1qr6POCJi1apVsWrVqoGYBQAOOT5rGgASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaARFWFeMiQIbFo0aJ48cUXo7u7OzZt2hTXX3/9QM0GAHVvaDU7L1iwIK6++uqYO3dubNy4MaZMmRIrVqyIt956K5YvXz5QMwJA3aoqxOecc0785je/id///vcREfHSSy/F7Nmz44wzzhiQ4QCg3lX11PSjjz4azc3NMW7cuIiIOO2002LatGnx4IMPfuAxw4cPj0ql0mcBAO+p6o54yZIl0djYGC+88ELs3r07DjvssFi4cGGsXLnyA49paWmJG2644aPOCQB1qao74pkzZ8all14ac+bMiUmTJsXcuXPj+9//flx22WUfeMzixYujsbGxdzU1NX3koQGgXlR1R/yTn/wklixZEr/61a8iIuLZZ5+N4447LlpaWuKuu+7a6zE9PT3R09Pz0ScFgDpU1R3xyJEjY8+ePX227d69O4YM8e3IALA/qrojfuCBB2LhwoXx8ssvx8aNG+Ozn/1sXHvttXHHHXcM1HwAUNeqCvE111wTN910U/z0pz+NY445JrZs2RK33XZbLFq0aKDmA4C61hARZTBPWKlUorOzM8Y0NkZXV9dgnhoGzTtlUP9Z9c/6huwJYECNOjN7gv+oVCrxemdnNPajdV7cBYBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABJV9UMfgP4Z1eBznYH+cUcMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBqadeLDK5WsUwPAgKqmcYMe4sr/DrepvX2wTw0Ag6pSqURXV9eH7tMQEWVwxvmPsWPH7nOwfalUKtHe3h5NTU0f+feqZ65T/7hO/eM69Y/r1D/1fp0qlUps2bJln/ulPDXdn8H6q6urqy7/AA8016l/XKf+cZ36x3Xqn3q9Tv39b/JmLQBIJMQAkOigDfHOnTvjhhtuiJ07d2aPUtNcp/5xnfrHdeof16l/XKf3pLxZCwB4z0F7RwwA9UCIASCREANAIiEGgEQHbYjnzZsXmzdvjh07dsS6deti6tSp2SPVlOuuuy7Wr18fnZ2d0dHREffdd1+MHz8+e6yatmDBgiilxNKlS7NHqTljx46Nu+++O7Zu3Rrd3d3xzDPPxOTJk7PHqilDhgyJRYsWxYsvvhjd3d2xadOmuP7667PHSnfuuefGb3/722hvb49SSsyYMeN9+9x4442xZcuW6O7ujoceeihOPvnkhElzlYNtzZw5s7z77rvl8ssvL5/61KfKbbfdVrZt21aOPvro9NlqZT344INl7ty55dOf/nQ57bTTyu9+97vS1tZWRo4cmT5bLa4pU6aUF198sTz11FNl6dKl6fPU0jryyCPL5s2byx133FGmTp1ajj/++DJ9+vRy4oknps9WS6ulpaW8+eab5ctf/nI57rjjyte//vXS2dlZrrnmmvTZMtcFF1xQbrrppnLxxReXUkqZMWNGn1//wQ9+ULZv314uuuii8pnPfKbcf//95Z///GcZMWJE+uyDuNIHqHqtW7euLF++vPfrhoaG8uqrr5YFCxakz1ar66ijjiqllHLuueemz1Jra9SoUeXvf/97aW5uLg8//LAQ/9davHhx+ctf/pI+R62vBx54oPziF7/os+3Xv/51ufvuu9Nnq5W1txBv2bKlfO973+v9urGxsezYsaPMmjUrfd7BWgfdU9PDhg2LyZMnx+rVq3u3lVJi9erVcfbZZydOVtuOOOKIiIjYtm1b8iS1p7W1NVatWhVr1qzJHqUmXXTRRbFhw4a49957o6OjI5588sm48sors8eqOY8++mg0NzfHuHHjIiLitNNOi2nTpsWDDz6YPFntOuGEE+LjH/94n8fzzs7OePzxxw+px/O0n0e8v4466qgYOnRodHR09Nne0dERp556atJUta2hoSGWLVsWjzzySGzcuDF7nJoya9asmDRpkvcYfIgTTzwxrr766rj55pvjRz/6UUydOjVuvfXW6Onpibvuuit7vJqxZMmSaGxsjBdeeCF2794dhx12WCxcuDBWrlyZPVrNGjNmTETEXh/P/+/XDgUHXYipXmtra0yYMCGmTZuWPUpNOfbYY+OWW26J6dOnH/IfsfdhhgwZEhs2bIiFCxdGRMRTTz0VEyZMiKuuukqI/5+ZM2fGpZdeGnPmzImNGzfG6aefHsuWLYstW7a4TuxT+vPj1axhw4aVXbt2ve91hjvvvLPcf//96fPV2lq+fHl5+eWXy/HHH58+S62tGTNmlFJK2bVrV+8qpZTdu3eXXbt2lSFDhqTPWAurra2t3H777X22XXXVVeXVV19Nn62W1ssvv1zmzZvXZ9vChQvL888/nz5braz/fo34hBNOKKWUMnHixD77/elPfyrLli1Ln3ew1kH3GvGuXbviiSeeiObm5t5tDQ0N0dzcHI899ljiZLVn+fLlcckll8QXv/jFaGtryx6n5qxZsyYmTJgQp59+eu/629/+Fvfcc0+cfvrpsWfPnuwRa8LatWvjlFNO6bNt/Pjx8dJLLyVNVJtGjhz5vr8zu3fvjiFDDrqH2UGzefPmeO211/o8nlcqlTjzzDMPucfz9P8bqHbNnDmz7Nixo1x22WXl1FNPLT//+c/Ltm3byjHHHJM+W62s1tbWsn379nLeeeeV0aNH966Pfexj6bPV8vKu6fevKVOmlJ6entLS0lJOOumkMnv27PL222+XOXPmpM9WS2vFihXllVde6f32pYsvvri88cYbZcmSJemzZa5Ro0aViRMnlokTJ5ZSSpk/f36ZOHFi+cQnPlEi3vv2pW3btpWvfe1rZcKECeW+++7z7UsHy/rOd75T2trayrvvvlvWrVtXzjjjjPSZaml9kLlz56bPVstLiPe+vvKVr5Rnnnmm7Nixozz33HPlyiuvTJ+p1tbhhx9eli5dWtra2kp3d3fZtGlTuemmm8qwYcPSZ8tc559//l4fi1asWNG7z4033lhee+21smPHjvLQQw+VcePGpc89mMuPQQSARF68AIBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAk+h90hWi7O3KZPwAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Build the course\n",
    "_course_dim = (8, 10)\n",
    "_inner_wall_dim = (2, 6)\n",
    "\n",
    "course = build_uturn_course(_course_dim, _inner_wall_dim)\n",
    "track = RaceTrackEnv(course)\n",
    "for row in course:\n",
    "    print(row)\n",
    "\n",
    "pos_map = track.course  # overlay track course\n",
    "plt.imshow(pos_map, cmap='hot', interpolation='nearest')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-ce1387b114d55595",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## 1) Monte-Carlo-Based Prediction (Policy Evaluation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-a3672043edcf93af",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Write a first-visit Monte-Carlo algorithm to evaluate the dummy policy as defined below on the U-turn course. The dummy policy turns the car to the right as soon as it stands in front of a wall. Try to understand how the policy works before you start to code.\n",
    "Think about what the different dimensions of the policy array encode. It might be helpful to print parts of the array for visualization.\n",
    "\n",
    "How can we interprete the state values resulting from the evaluation with first-visit Monte-Carlo?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select course and initialize dummy policy\n",
    "\n",
    "course = build_uturn_course(_course_dim, _inner_wall_dim)\n",
    "track = RaceTrackEnv(course)\n",
    "dummy_slow_pi = np.ones([track.bounds[0], track.bounds[1], 1+2*track.MAX_VELOCITY, 1+2*track.MAX_VELOCITY]) * 4 \n",
    "\n",
    "dummy_slow_pi[:track.bounds[0]//2, :, 0 , 0] = 5  # accelerate right\n",
    "dummy_slow_pi[:track.bounds[0]//2, -2:, 0, :] = 6  # accelerate bottom left\n",
    "dummy_slow_pi[-2:, track.bounds[1]//2:, :, 0] = 0  # accelerate top left\n",
    "\n",
    "pi = dummy_slow_pi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-32d1e89b52d7ea2b",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## 1) Solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The simple and deterministic dummy policy will always guarantee the car to reach the finish line. Thus, the state values can be interpreted as the number of timesteps that is necessary to reach the goal from that specific state (i.e. position and velocity) if we are following the policy.\n",
    "\n",
    "In simple words, the policy acts as follows:\n",
    "- ```dummy_slow_pi[:track.bounds[0]//2, :, 0, 0] = 5```: This part of the policy accelerates the car to the right when it is located anywhere in the upper half of the track bounds (first two dimensions) and has no velocity, i.e. is standing still (last two dimensions). This means that this part of the policy accelerates the car to a maximum velocity of ```v_x = 1```. As the car is not affected by this after it is already moving.\n",
    "- ```dummy_slow_pi[:track.bounds[0]//2, -2:, 0, :] = 6```: This part of the policy takes effect when the car is in the upper half of the right boundary of the track. It only affects the car, when there is no vertical velocity (3rd dimension). As the car reaches this area of the space with the velocity ```v_x = 1``` and ```v_y = 0```, the velocity is ```v_x = 0``` and ```v_y = 1``` after this part of the policy was applied once. \n",
    "- ```dummy_slow_pi[-2:, track.bounds[1]//2:, :, 0] = 0```: This part of the policy takes effect when the car is at the right half of the lower boundary of the track. It only affects the car, when there is no horizontal velocity (4th dimension). As the car reaches this area of the space with the velocity ```v_x = 0``` and ```v_y = -1```, the velocity is ```v_x = -1``` and ```v_y = 0``` after this part of the policy was applied once.\n",
    "\n",
    "Overall, we can see that the absolute value of the car's velocity never goes above 1. While this makes the movement of the car manageable, this is not the fastest way to get through the track."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function templates to help structure the code, these (and similar ones) will be used in future exercises as well\n",
    "# Note that the environment/track is a global variable here, so it does not need to be passed as an argument\n",
    "# This is not a good practice in general, but it is done here to simplify the code\n",
    "# You can therefore use the track variable directly in the functions\n",
    "# e.g., track.reset()\n",
    "\n",
    "def interact(pi, state):\n",
    "    \"\"\"Interact with the environment to get to the next state.\n",
    "\n",
    "    Args:\n",
    "        pi: The policy to follow\n",
    "        state: The current state before interaction\n",
    "\n",
    "    Returns:\n",
    "        next_state: The next state after interaction\n",
    "        reward: The reward for the current interaction\n",
    "        terminated: If the goal was reached\n",
    "        truncated: If the boundary of the track was breached\n",
    "    \"\"\"\n",
    "    \n",
    "    action = track.action_to_tuple(pi[state])\n",
    "    next_state, reward, terminated, truncated, _ = track.step(action)\n",
    "\n",
    "    return next_state, reward, terminated, truncated\n",
    "\n",
    "\n",
    "def gather_experience(pi, max_epsiode_len):\n",
    "    \"\"\"Simulate a full episode of data by repeatedly interacting with the environment.\n",
    "\n",
    "    End the episode when the finish line is reached. Whenever the car leaves the track, simply\n",
    "    reset the environment.\n",
    "    \n",
    "    Args:\n",
    "        pi: The policy to apply\n",
    "        max_episode_len: The number of steps at which the episode is terminated automatically\n",
    "\n",
    "    Returns:\n",
    "        states: All states that were visited in the episode\n",
    "        rewards: All rewards that were acquired in the episode\n",
    "        visited_states: The unique states that were visited in the episode\n",
    "        first_visit_list: Whether it was the first time in the episode that the\n",
    "            state at the same index was visited\n",
    "    \"\"\"\n",
    "    # initialize variables in which collected data will be stored\n",
    "    states = [] # list of tuples\n",
    "    rewards = [] # list of floats\n",
    "    visited_states = set() # set of tuples\n",
    "    first_visit_list = [] # list of booleans\n",
    "\n",
    "    # reset environment and start episode\n",
    "    state = track.reset()\n",
    "    # There will be two different ways to end the episode: reaching the finish line or reaching the maximum number of steps\n",
    "    # If the car hits the boundary of the track, the environment will be reset and the episode will continue\n",
    "    # Therefore, you have to handle termination and truncation differently\n",
    "    for k in range(max_epsiode_len):\n",
    "        \n",
    "        # save the momentary state and check for first_visit\n",
    "        states.append(state)\n",
    "        first_visit_list.append(state not in visited_states)\n",
    "        visited_states.add(state)\n",
    "        \n",
    "        # interact with the environment\n",
    "        next_state, reward, terminated, truncated = interact(pi, state)\n",
    "\n",
    "        # reset the environment if the car left the track\n",
    "        if truncated:\n",
    "            next_state = track.reset()\n",
    "\n",
    "        # update the state for the next iteration\n",
    "        state = next_state\n",
    "\n",
    "        # save received reward\n",
    "        rewards.append(reward)\n",
    "        # terminate the environment if the finish line was passed\n",
    "        if terminated: \n",
    "            break \n",
    "\n",
    "    return states, rewards, visited_states, first_visit_list\n",
    "\n",
    "def learn(values, n_dict, states, rewards, first_visit_list, gamma):\n",
    "    \"\"\"Learn from the collected data using the incremental first-visit MC-based prediction algorithm.\n",
    "\n",
    "    Args:\n",
    "        values: The state-value estimates before the current update step\n",
    "        n_dict: The state visit counts before the current update step\n",
    "        states: All states that were visited in the last episode\n",
    "        rewards: All rewards that were visited in the last episode\n",
    "        first_visit_list: Whether it was the first time in the episode that the\n",
    "            state at the same index was visited\n",
    "        gamma: Discount factor\n",
    "\n",
    "    Returns:\n",
    "        values: The updated state-value estimates\n",
    "        n_dict: The state visit counts after the current update step\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "    g = 0  \n",
    "    for state, reward, first_visit in zip(states[::-1], rewards[::-1], first_visit_list[::-1]): # count backwards\n",
    "\n",
    "        # calculate return\n",
    "        g = gamma * g + reward\n",
    "        \n",
    "        # update values if it is the first visit in the episode\n",
    "        if first_visit:\n",
    "            \n",
    "            # Count visits to this state in n_dict\n",
    "            n_dict[state] = n_dict.get(state, 0) + 1\n",
    "\n",
    "            # add new return g to existing value\n",
    "            values[state] += 1/n_dict[state] * (g-values[state])\n",
    "\n",
    "    ### END SOLUTION\n",
    "    return values, n_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-ac5467fab5f148f4",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b0d85029b44946a1b6626585cdb8d6c2",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/500 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# initialize the value function\n",
    "values = np.zeros([track.bounds[0], track.bounds[1], 1+2*track.MAX_VELOCITY, 1+2*track.MAX_VELOCITY])\n",
    "\n",
    "# initialize an empty dict to count the number of visits\n",
    "# note that in the lecture the list l(x_k) was used for this purpose\n",
    "n_dict = {}\n",
    "\n",
    "# configuration parameters\n",
    "gamma = 1 # discount factor\n",
    "no_episodes = 500 # number of evaluated episodes\n",
    "max_episode_len = 2000 # number of allowed timesteps per episode\n",
    "\n",
    "for e in tqdm(range(no_episodes), position=0, leave=True):\n",
    "\n",
    "    states, rewards, visited_states, first_visit_list = gather_experience(pi, max_episode_len)\n",
    "\n",
    "    values, n_dict = learn(values, n_dict, states, rewards, first_visit_list, gamma)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-6fe53fdd68a6c909",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "To visualize the result of the evaluation, plot the state values as a function of **position only** (so that you get a two dimensional representation of the state value) and in the form of a tabular represenation and a heatmap. In order to omit dependence of the velocity dimensions, use the minimum of the value function with respect to the velocities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-74fc6bcd5def8261",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "000 000 000 000 000 000 000 000 000 000 000 000\n",
      "000 000 000 000 000 -17 -16 -15 -14 -13 -12 000\n",
      "000 000 000 000 000 -16 -15 -14 -13 -12 -11 000\n",
      "000 000 000 000 000 -15 -14 -13 -12 -11 -10 000\n",
      "000 000 000 000 000 000 000 000 000 000 -09 000\n",
      "000 000 000 000 000 000 000 000 000 000 -08 000\n",
      "000 000 000 000 000 000 000 000 000 000 -07 000\n",
      "000 000 000 000 000 000 000 000 000 000 -06 000\n",
      "000 000 000 000 000 000 -01 -02 -03 -04 -05 000\n",
      "000 000 000 000 000 000 000 000 000 000 000 000\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeIAAAGdCAYAAADOnXC3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAU9klEQVR4nO3da4xV5dn44XuQQwPu0Q8olLH1VNA2VCwHq9ZD2wmp9iCaJlAwEWNsq7S+MbYpTrCJYlJomghKpq2xEavRpKYJttaaVog9iFKLRo0oplTGw6CjFuKMzshQfP4f/L/Td8oos5GZe7O5ruT5MMu1WLdLs3+svffs3RARJQCAFCOyBwCAg5kQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQaGTGSSdNmhRdXV0ZpwaAYVGpVGLbtm173W/YQzxp0qRob28f7tMCwLBramraa4yHPcT/eyf8iaameMtdMQB16NBKJba0tw/q2d+Up6YjIt7q6vL0NAAHPW/WAoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABLtU4gXLVoUW7dujZ6entiwYUPMmjVrf88FAAeFqkM8d+7cuOGGG+K6666L6dOnx5NPPhl/+MMf4ogjjhiK+QCgrlUd4quuuipuueWWuO222+LZZ5+Nyy67LLq7u+OSSy4ZivkAoK5VFeJRo0bFjBkzYu3atX3bSimxdu3aOO200wY8ZvTo0VGpVPotAOA9VYV4/PjxMXLkyOjo6Oi3vaOjIyZOnDjgMS0tLdHZ2dm32tvb931aAKgzQ/6u6WXLlkVjY2PfampqGupTAsABY2Q1O7/xxhvx73//OyZMmNBv+4QJE+LVV18d8Jje3t7o7e3d9wkBoI5VdUe8a9eueOyxx6K5ublvW0NDQzQ3N8cjjzyy34cDgHpX1R1xRMQNN9wQv/zlL2Pjxo3x6KOPxpVXXhnjxo2L1atXD8V8AFDXqg7x3XffHUcccUQsXbo0Jk6cGE888UScc8458dprrw3FfABQ1xoiogznCSuVSnR2dsbExsbo6uoazlMDwLCoVCrxamdnNA6idT5rGgASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEhU9Zc+AHv3dhnWj3AfpIeyBxjA+uwBBmCmQWnfnj3BHsYdlT3BvnFHDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBINDJ7AKhPD2UPMID12QMMwEyD0r49e4I9bc4eoH64IwaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQqKoQX3311fHoo49GZ2dndHR0xJo1a2LKlClDNRsA1L2qQnz22WdHa2trnHrqqTF79uwYNWpU/PGPf4yxY8cO1XwAUNdGVrPzueee2+/niy++OF5//fWYMWNG/PWvf92vgwHAwaCqEP+3ww47LCIitm/f/r77jB49OsaMGdP3c6VS+TCnBIC6ss9v1mpoaIiVK1fGQw89FJs2bXrf/VpaWqKzs7Nvtbe37+spAaDu7HOIW1tbY+rUqfGNb3zjA/dbtmxZNDY29q2mpqZ9PSUA1J19emp61apV8dWvfjXOOuusvd7h9vb2Rm9v7z4NBwD1ruoQr1q1Ki644IL4/Oc/H21tbUMwEgAcPKoKcWtrayxYsCDmzJkTXV1dMWHChIiIePPNN+Odd94ZkgEBoJ5V9RrxokWL4vDDD48///nP8eqrr/atefPmDdV8AFDXqrojbmhoGKo5AOCg5LOmASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgET79H3EwN6szx5gAGYalPbt2RPsaXP2AAN4LnuA+uGOGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQaGT2AFCPxjVcnT0CDKm3/yd7gvrhjhgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAog8V4sWLF0cpJVasWLG/5gGAg8o+h3jmzJnx7W9/O5588sn9OQ8AHFT2KcTjxo2LO++8M775zW/Gjh079vdMAHDQ2KcQt7a2xn333Rfr1q3b676jR4+OSqXSbwEA7xlZ7QHz5s2L6dOnx6xZswa1f0tLS1x77bXVngYADgpV3REfddRRceONN8aFF14YO3fuHNQxy5Yti8bGxr7V1NS0T4MCQD2q6o54xowZMWHChHj88cf/8weMHBlnnXVWfPe7340xY8bEu+++2++Y3t7e6O3t3T/TAkCdqSrE69ati6lTp/bbtnr16ti8eXP8+Mc/3iPCAMAHqyrEb731VmzatKnftrfffjv+9a9/7bEdANg7n6wFAImqftf0f/vCF76wP+YAgIOSO2IASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEg0Yf+rGkADkKbsweoH+6IASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJRmYPAMAB6LnsAeqHO2IASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiaoO8aRJk+KOO+6IN954I7q7u+Opp56KGTNmDMVsAFD3qvo+4sMPPzzWr18fDz74YJx77rnx+uuvx+TJk2PHjh1DNR8A1LWqQrx48eJ46aWX4pJLLunb1tbWtr9nAoCDRlVPTZ933nmxcePGuPvuu6OjoyMef/zxuPTSSz/wmNGjR0elUum3AID3VBXi4447Li6//PL4xz/+EV/60pfiZz/7Wdx0001x0UUXve8xLS0t0dnZ2bfa29s/9NAAUC8aIqIMduedO3fGxo0b43Of+1zfthtvvDFmzZoVp59++oDHjB49OsaMGdP3c6VSifb29pjY2BhdXV37PjkAad4+OnuCPY17IXuC/6hUKvFqZ2c0DqJ1Vd0Rv/LKK/HMM8/02/bss8/Gxz/+8fc9pre3N7q6uvotAOA9VYV4/fr1ccIJJ/TbNmXKlHjhhRr6awgAHECqCvGKFSvi1FNPjZaWljj++ONj/vz58a1vfStaW1uHaj4AqGtVhXjjxo1xwQUXxPz58+Ppp5+OH/7wh3HllVfGXXfdNVTzAUBdq+rNWvtDpVKJzs5Ob9YCOIB5s9YHG7I3awEA+5cQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaARCOzBwDgwNNTQ5/rfKBzRwwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASDQyewBgeJyQPcAATsweYACu0+A8lz1AHXFHDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASBRVSEeMWJELF26NJ5//vno7u6OLVu2xDXXXDNUswFA3avq+4gXL14cl19+eSxcuDA2bdoUM2fOjNWrV8ebb74Zq1atGqoZAaBuVRXi008/PX7zm9/E73//+4iIeOGFF2L+/PlxyimnDMlwAFDvqnpq+uGHH47m5uaYPHlyREScdNJJccYZZ8T999//vseMHj06KpVKvwUAvKeqO+Lly5dHY2NjbN68OXbv3h2HHHJILFmyJO666673PaalpSWuvfbaDzsnANSlqu6I586dGxdeeGEsWLAgpk+fHgsXLozvf//7cdFFF73vMcuWLYvGxsa+1dTU9KGHBoB6UdUd8U9+8pNYvnx5/OpXv4qIiKeffjqOPvroaGlpidtvv33AY3p7e6O3t/fDTwoAdaiqO+KxY8fGu+++22/b7t27Y8QIv44MAPuiqjvie++9N5YsWRIvvvhibNq0KT7zmc/EVVddFbfeeutQzQcAda2qEF9xxRVx/fXXx09/+tM48sgjY9u2bXHzzTfH0qVLh2o+AKhrDRFRhvOElUolOjs7Y2JjY3R1dQ3nqeGgdkL2AAM4MXuAAbhOg1OL1+lz2QP8H5VKJV7t7IzGQbTOi7sAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAoqq+9AE4cD2XPcAAanEmGG7uiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEI7NOfGilknVqABhS1TRu2ENc+f/DbWlvH+5TA8CwqlQq0dXV9YH7NEREGZ5x/mPSpEl7HWxvKpVKtLe3R1NT04f+s+qZ6zQ4rtPguE6D4zoNTr1fp0qlEtu2bdvrfilPTQ9msMHq6uqqy/+A+5vrNDiu0+C4ToPjOg1OvV6nwf47ebMWACQSYgBIdMCGeOfOnXHttdfGzp07s0epaa7T4LhOg+M6DY7rNDiu03tS3qwFALzngL0jBoB6IMQAkEiIASCREANAogM2xIsWLYqtW7dGT09PbNiwIWbNmpU9Uk25+uqr49FHH43Ozs7o6OiINWvWxJQpU7LHqmmLFy+OUkqsWLEie5SaM2nSpLjjjjvijTfeiO7u7njqqadixowZ2WPVlBEjRsTSpUvj+eefj+7u7tiyZUtcc8012WOlO/PMM+O3v/1ttLe3Rykl5syZs8c+1113XWzbti26u7vjgQceiE984hMJk+YqB9qaO3dueeedd8rFF19cPvnJT5abb765bN++vRxxxBHps9XKuv/++8vChQvLpz71qXLSSSeV3/3ud6Wtra2MHTs2fbZaXDNnzizPP/98eeKJJ8qKFSvS56mldfjhh5etW7eWW2+9tcyaNascc8wxZfbs2eW4445Ln62WVktLS3n99dfLl7/85XL00UeXr3/966Wzs7NcccUV6bNlrnPOOadcf/315fzzzy+llDJnzpx+//wHP/hB2bFjRznvvPPKpz/96XLPPfeUf/7zn2XMmDHpsw/jSh+g6rVhw4ayatWqvp8bGhrKyy+/XBYvXpw+W62u8ePHl1JKOfPMM9NnqbU1bty48txzz5Xm5uby4IMPCvF/rWXLlpW//OUv6XPU+rr33nvLL37xi37bfv3rX5c77rgjfbZaWQOFeNu2beV73/te38+NjY2lp6enzJs3L33e4VoH3FPTo0aNihkzZsTatWv7tpVSYu3atXHaaaclTlbbDjvssIiI2L59e/Iktae1tTXuu+++WLduXfYoNem8886LjRs3xt133x0dHR3x+OOPx6WXXpo9Vs15+OGHo7m5OSZPnhwRESeddFKcccYZcf/99ydPVruOPfbY+OhHP9rv8byzszP+9re/HVSP52nfR7yvxo8fHyNHjoyOjo5+2zs6OuLEE09Mmqq2NTQ0xMqVK+Ohhx6KTZs2ZY9TU+bNmxfTp0/3HoMPcNxxx8Xll18eN9xwQ/zoRz+KWbNmxU033RS9vb1x++23Z49XM5YvXx6NjY2xefPm2L17dxxyyCGxZMmSuOuuu7JHq1kTJ06MiBjw8fx//9nB4IALMdVrbW2NqVOnxhlnnJE9Sk056qij4sYbb4zZs2cf9B+x90FGjBgRGzdujCVLlkRExBNPPBFTp06Nyy67TIj/j7lz58aFF14YCxYsiE2bNsXJJ58cK1eujG3btrlO7FX68+PVrFGjRpVdu3bt8TrDbbfdVu655570+WptrVq1qrz44ovlmGOOSZ+l1tacOXNKKaXs2rWrb5VSyu7du8uuXbvKiBEj0meshdXW1lZuueWWftsuu+yy8vLLL6fPVkvrxRdfLIsWLeq3bcmSJeXZZ59Nn61W1n+/RnzssceWUkqZNm1av/3+9Kc/lZUrV6bPO1zrgHuNeNeuXfHYY49Fc3Nz37aGhoZobm6ORx55JHGy2rNq1aq44IIL4otf/GK0tbVlj1Nz1q1bF1OnTo2TTz65b/3973+PO++8M04++eR49913s0esCevXr48TTjih37YpU6bECy+8kDRRbRo7duwe/8/s3r07Row44B5mh83WrVvjlVde6fd4XqlU4rOf/exB93ie/reBatfcuXNLT09Pueiii8qJJ55Yfv7zn5ft27eXI488Mn22Wlmtra1lx44d5ayzzioTJkzoWx/5yEfSZ6vl5V3Te66ZM2eW3t7e0tLSUo4//vgyf/788tZbb5UFCxakz1ZLa/Xq1eWll17q+/Wl888/v7z22mtl+fLl6bNlrnHjxpVp06aVadOmlVJKufLKK8u0adPKxz72sRLx3q8vbd++vXzta18rU6dOLWvWrPHrSwfK+s53vlPa2trKO++8UzZs2FBOOeWU9Jlqab2fhQsXps9Wy0uIB15f+cpXylNPPVV6enrKM888Uy699NL0mWptHXrooWXFihWlra2tdHd3ly1btpTrr7++jBo1Kn22zHX22WcP+Fi0evXqvn2uu+668sorr5Senp7ywAMPlMmTJ6fPPZzL1yACQCIvXgBAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEv0/tYaCzLiG4kEAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def text_print_pos_map(_pos_map):\n",
    "    for row in _pos_map:\n",
    "        print(' '.join(x_size*['{}']).format(*[str(int(r)).zfill(3) for r in row]))\n",
    "        \n",
    "def plot_pos_map(_pos_map):\n",
    "    plt.imshow(_pos_map, cmap='hot', interpolation='nearest')\n",
    "    plt.show()\n",
    "\n",
    "# calculate minimum value with respect to velocities\n",
    "x_size, y_size = len(course[0]), len(course)\n",
    "pos_map = np.zeros((y_size, x_size))\n",
    "\n",
    "for s_x in range(x_size):\n",
    "    for s_y in range(y_size):\n",
    "        pos_map[s_y, s_x] = np.min(values[s_y, s_x, :, :])\n",
    "        \n",
    "text_print_pos_map(pos_map)\n",
    "plot_pos_map(-pos_map)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-54642e38ce9d8a67",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## 2) On-Policy $\\varepsilon$-Greedy Control"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-a81f379107be8dd3",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Starting with the previously used turn-right-if-wall dummy policy, write an on-policy Monte-Carlo based first-visit $\\varepsilon$-greedy control algorithm to solve the U-turn course. The policy is now stochastic: it does not contain simple action commands for each state, but probabilities for each possible action. Again, please make sure to understand how the stochastic policy works before coding.\n",
    "\n",
    "\n",
    "Make sure to implement an upper bound for episode length (we suggest a boundary of 200 steps). Why do we need a bound like this? What happens to the state values / state-action values if we increase the bound?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# dummy policy\n",
    "course = build_uturn_course(_course_dim, _inner_wall_dim)\n",
    "track = RaceTrackEnv(course)\n",
    "\n",
    "dummy_slow_stoch_pi = np.zeros([track.bounds[0], track.bounds[1], 1+2*track.MAX_VELOCITY, 1+2*track.MAX_VELOCITY, 9])\n",
    "\n",
    "dummy_slow_stoch_pi[  :,   :, :, :, 4] = 1 # set probability of doing nothing to one for every state\n",
    "\n",
    "# set probability to accelerate right:\n",
    "dummy_slow_stoch_pi[:track.bounds[0]//2, :, 0 , 0, 5] = 1 \n",
    "# set probability to do nothing where we want to accelerate right:\n",
    "dummy_slow_stoch_pi[:track.bounds[0]//2, :, 0 , 0, 4] = 0 \n",
    "\n",
    "dummy_slow_stoch_pi[:track.bounds[0]//2, -2:, 0 , :, 6] = 1 # probability to accelerate bottom left\n",
    "dummy_slow_stoch_pi[:track.bounds[0]//2, -2:, 0 , :, 4] = 0 \n",
    "\n",
    "dummy_slow_stoch_pi[-2:, track.bounds[1]//2:, : , 0, 0] = 1 # probability to accelerate top left\n",
    "dummy_slow_stoch_pi[-2:, track.bounds[1]//2:, : , 0, 4] = 0\n",
    "\n",
    "pi = dummy_slow_stoch_pi       "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-2143fc4c280b5b6f",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## 2) Solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-89a131cffdbb5d52",
     "locked": false,
     "points": 0,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "source": [
    "Algorithm given below.\n",
    "\n",
    "As we can see, the dummy policy allows for the initial episode to be solved very fast. After that, the dummy policy is forgotten and it takes some time until the agent is able to solve the problem again. \n",
    "\n",
    "The limitation of the episode length forces the agent to learn at least after the allowed number of steps were taken. If one would increase the limit, this would mainly inflate the accumulated return, resulting in larger negative action values for the visited states. As long as we do NOT find the goal, action values will correlate with the time limit. If we find the goal reproducible, the action values will drift towards their true optimal value independently from the time limit.\n",
    "\n",
    "If we do not implement a time limit and allow the episode to terminate only by reaching the goal, the accumulated negative return will explode (we will get very large numbers). As we try to act greedy (take the highest rated and not the lowest rated action), low action values would suggest that the goal is not to be found on the path taken previously."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# function templates, some of these are quite close to the solutions for task 1\n",
    "\n",
    "def policy(pi, state, deterministic, epsilon):\n",
    "    \"\"\"Decides on one of the actions in dependence of the current state.\n",
    "\n",
    "    Args:\n",
    "        pi: The current policy\n",
    "        state: The state vector\n",
    "        deterministic: Whether actions are chosen deterministically or eps-greedily\n",
    "        epsilon: Probability for random action in eps-greedy\n",
    "\n",
    "    Returns:\n",
    "        action: The chosen action\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "\n",
    "    if epsilon < np.random.rand(1) or deterministic:\n",
    "        action = np.argmax(pi[state])\n",
    "    else:\n",
    "        action = random.choice(range(9))\n",
    "    \n",
    "    action = track.action_to_tuple(action)\n",
    "\n",
    "    ### END SOLUTION\n",
    "    return action\n",
    "\n",
    "\n",
    "def interact(pi, state, deterministic, epsilon):\n",
    "    \"\"\"Interact with the environment to get to the next state. Either follow\n",
    "    the given policy or explore randomly with probability epsilon.\n",
    "\n",
    "    Args:\n",
    "        pi: The policy to follow\n",
    "        state: The current state before interaction\n",
    "        deterministic: Whether actions are chosen deterministically or eps-greedily\n",
    "        epsilon: The probability for random interaction\n",
    "\n",
    "    Returns:\n",
    "        next_state: The next state after interaction\n",
    "        reward: The reward for the current interaction\n",
    "        terminated: If the goal was reached\n",
    "        truncated: If the boundary of the track was breached\n",
    "        action: The applied action\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "\n",
    "    action = policy(pi, state, deterministic, epsilon)\n",
    "\n",
    "    next_state, reward, terminated, truncated, _ = track.step(action)\n",
    "\n",
    "    ### END SOLUTION\n",
    "    return next_state, reward, terminated, truncated, action\n",
    "\n",
    "\n",
    "def gather_experience(pi, max_episode_len, deterministic=False, epsilon=0.1):\n",
    "    \"\"\"Simulate a full episode of data by repeatedly interacting with the environment.\n",
    "\n",
    "    End the episode when the finish line is reached. Whenever the car leaves the track, simply\n",
    "    reset the environment.\n",
    "\n",
    "    Args:\n",
    "        pi: The policy to apply\n",
    "        max_episode_len: The number of steps at which the episode is terminated automatically\n",
    "        deterministic: Whether actions are chosen deterministically or eps-greedily\n",
    "        epsilon: Exploration probability\n",
    "\n",
    "    Returns:\n",
    "        state_actions: All states that were visited and all actions that were applied in \n",
    "            the episode, states and actions are simply concatenated. **HINT**: You can use \n",
    "            ```track.state_action(state, action)``` to concatenate state and action.\n",
    "        rewards: All rewards that were acquired in the episode\n",
    "        visited_states: The unique states that were visited in the episode\n",
    "        first_visit_list: Whether it was the first time in the episode that the\n",
    "            state at the same index was visited\n",
    "        pos_map: A map of the track where all state visits are marked\n",
    "    \"\"\"\n",
    "\n",
    "    # initialize variables in which collected data will be stored\n",
    "    state_actions = [] # list of tuples\n",
    "    rewards = [] # list of floats\n",
    "    visited_state_actions = set() # set of tuples\n",
    "    first_visit_list = [] # list of booleans\n",
    "    \n",
    "    pos_map = np.zeros((y_size, x_size)) # initializes a map that can be plotted\n",
    "    \n",
    "    ### BEGIN SOLUTION\n",
    "    \n",
    "    state = track.reset()\n",
    "    for k in range(max_episode_len):\n",
    "\n",
    "        pos_map[state[0], state[1]] += 1  # mark the visited position on the map\n",
    "\n",
    "        next_state, reward, terminated, truncated, action = interact(pi, state, deterministic, epsilon)\n",
    "\n",
    "        # save the momentary state+action and check for first_visit\n",
    "        state_action = track.state_action(state, action)\n",
    "        state_actions.append(state_action)\n",
    "        first_visit_list.append(state_action not in visited_state_actions)\n",
    "        visited_state_actions.add(state_action)\n",
    "\n",
    "        if truncated:\n",
    "            next_state = track.reset()\n",
    "\n",
    "        state = next_state\n",
    "\n",
    "        # save received reward\n",
    "        rewards.append(reward)\n",
    "        \n",
    "        # terminate the environment if the finish line was passed\n",
    "        if terminated:\n",
    "            break\n",
    "\n",
    "    ### END SOLUTION\n",
    "    return state_actions, rewards, visited_states, first_visit_list, pos_map\n",
    "\n",
    "\n",
    "def learn(pi, action_values, n_dict, state_actions, rewards, first_visit_list, gamma, epsilon):\n",
    "    \"\"\"Learn from the collected data with eps-greedy MC-control and update the policy.\n",
    "\n",
    "    Args:\n",
    "        pi: The policy before the update step\n",
    "        action_values: The action-value estimates before the current update step\n",
    "        n_dict: The state action visit counts before the update\n",
    "        state_actions: All state actions that were done in the last epsiode\n",
    "        rewards: All rewards that were visited in the last episode\n",
    "        first_visit_list: Whether it was the first time in the episode that the\n",
    "            state at the same index was visited\n",
    "        gamma: Discount factor\n",
    "        epsilon: Exploration probability\n",
    "\n",
    "    Returns:\n",
    "        pi: The updated policy\n",
    "        action_values: The updated action-value estimates\n",
    "        n_dict: The updated state action visit counts\n",
    "        \n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "\n",
    "    g = 0   \n",
    "    for reward, state_action, first_visit in zip(rewards[::-1], state_actions[::-1], first_visit_list[::-1]): # count backwards\n",
    "        g = gamma * g + reward\n",
    "        \n",
    "        if first_visit:\n",
    "            \n",
    "            # Count visits to this state in n_list\n",
    "            n_dict[state_action] = n_dict.get(state_action, 0) +  1\n",
    "\n",
    "            # add new return g to existing value\n",
    "            action_values[state_action] += 1/n_dict[state_action] * (g - action_values[state_action])\n",
    "                        \n",
    "            # calculate the new action probabilities\n",
    "            u_best = np.argmax(action_values[state_action[:4]])\n",
    "            pi[state_action[:4]] = epsilon / 9\n",
    "            pi[state_action[:4]][u_best] = 1 - epsilon + epsilon / 9\n",
    "    \n",
    "    ### END SOLUTION\n",
    "    return pi, action_values, n_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-9568aa87f2614759",
     "locked": false,
     "points": 0,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2df3b1699740425a9185a4df56844e4b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "episode:   0%|          | 0/5000 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# initialize action_values and counting dict\n",
    "action_values = np.zeros([track.bounds[0], track.bounds[1], 1+2*track.MAX_VELOCITY, 1+2*track.MAX_VELOCITY, 3, 3])\n",
    "n_dict = {}\n",
    "\n",
    "# configuration parameters\n",
    "epsilon = 0.1 # exploration probability\n",
    "gamma = 1 # discount factor\n",
    "no_episodes = 5000 # number of evaluated episodes\n",
    "max_episode_len = 200 # number of evaluated timesteps per episode\n",
    "track_maps_l = []  # placeholder for tracks\n",
    "\n",
    "track = RaceTrackEnv(course)\n",
    "x_size, y_size = len(course[0]), len(course)\n",
    "\n",
    "for e in tqdm(range(no_episodes), desc='episode', mininterval=2):\n",
    "      \n",
    "    state_actions, rewards, visited_states, first_visit_list, pos_map = gather_experience(pi, max_episode_len, epsilon)\n",
    "\n",
    "    pi, action_values, n_dict = learn(pi, action_values, n_dict, state_actions, rewards, first_visit_list, gamma, epsilon)\n",
    "\n",
    "    # optional value map logging\n",
    "    track_maps_l.append(track.course + (pos_map > 0).astype(np.float32))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-ef5799678637f070",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "MovieWriter ffmpeg unavailable; using Pillow instead.\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeIAAAGdCAYAAADOnXC3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAc6ElEQVR4nO3df1RVdb7/8ReE0IgHnVJxxDItTcvSBDTT/BHXW82U5nQHRy3th5VZTmbNIFe7+aNv2vwQjZg0GzFd2VdXM+aUmoVjY2nqqJmF2tUEVFDUNA8Kehj53D+0M4OicUh4Hw/Px1qftWKzN/vNUXm2Dwd2mCQnAABgItx6AAAAajNCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAoQiLkzZt2lRFRUUWpwYAoEZ4PB4VFBR87341HuKmTZsqPz+/pk8LAECNi4uL+94Y13iIv7sSviYuTke5KgYAhKB6Ho925OdX6tlfk6emJeloURFPTwMAaj1erAUAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAoSqFePjw4crJyVFJSYnWrFmjxMTECz0XAAC1QsAhTk5O1pQpUzR+/Hh17NhRn3/+uZYtW6ZGjRpVx3wAAIS0gEM8atQozZw5U7Nnz9bWrVs1bNgwFRcX66GHHqqO+QAACGkBhbhOnTqKj49XVlaWf5tzTllZWerSpUuFx0RGRsrj8ZRbAADglIBC3LBhQ0VERKiwsLDc9sLCQjVp0qTCY1JTU+X1ev0rPz+/6tMCABBiqv1V05MmTVJMTIx/xcXFVfcpAQC4aEQEsvPBgwf1z3/+U7GxseW2x8bGat++fRUe4/P55PP5qj4hAAAhLKAr4tLSUm3YsEFJSUn+bWFhYUpKStKnn356wYcDACDUBXRFLElTpkzRG2+8ofXr12vdunUaOXKkoqOjlZmZWR3zAQAQ0gIO8YIFC9SoUSNNmDBBTZo00aZNm3THHXdo//791TEfAAAhLUySq8kTejweeb1eNYmJUVFRUU2eGgCAGuHxeLTP61VMJVrH75oGAMAQIa4hW3Jy9MRTT1mPAQAIMiEX4hmZmfr/Cxf63166YoV+m5ZWY+e/b8gQ5R8+fNb27omJmvXaazU2R8urr9Y+r/esWSIiIjT6uef0xY4d+qakRGs2bVLv228/6/hHhw/XlpwcfVNSoo/WrFH8GTf2iIqK0pRXXtGugwdVWFSkN99+W40bN67WzwkAQlHIhbi61KlT5wcdf/DgQZWUlFygac4vIiJCs996S6s//vis9z3/wgt6+LHH9OyIEYq/7jq9Pn263lq4UO07dPDvc29ysiZPmaJJ48era8eO+uLzz7XojBt7vJSWpp/efbfu/8UvdHuPHvpJ06aa95e/1MSnBwAhJaRDPCMzU9179tQTI0fqmHM65pyubN5cknTd9ddr4ZIlKiwqUs6+fXp9zhxdfvnl/mOXrlihP6Sn67dpaco7cECLli2TJI14+mmt27xZ+48e1Ve7diktI0PR0dGSpFt79NCM2bPVoEED//n++/nnJZ391HSzK67Q/HfeUWFRkfYeOaI58+eXu6L87+ef16effaYB992nLTk5Kvj2W81+6y3Vq1fvez/v5194Qf+7bZv+smDBWe8bcP/9+t2LL2rZ0qXKzcnR69Ona9mSJfrVM8/49xkxapQyZ87U3NmztW3rVv1q2DCVFBdr8Okbe8TExGjIww9r9KhR+vuKFdq0caOGPfigunTtqsTOnSv95wMACPEQ//qpp7Rm9WrNeu01tWzSRC2bNNGe3btVv359Lfnb3/T5Z5/p1oQE3XPHHWocG6u5Z4Rr0JAh8vl8+o+uXfXUsGGSpLKyMj37q18p4frr9eiQIepx22164be/lSStWb1av37qKR05csR/vmm///1Zc4WFhWnBokX68WWX6fYePXR3795q0bKl3pg/v9x+La6+Wnfdc4/+66679F933aVbe/TQM6NHn/dz7tGrl/r94hd6+oknKnx/ZFSUjh8/Xm7b8ZISdenWTdKpK/+b4uO14owbe6zIylKn0zf2uCk+XpGRkeX2+d+vvtKuvDx1PsfNPwAAFQv454gvJl6vVz6fTyXFxeVuVPHYk0/q888+07gxY/zbhj30kLbv2aNrWrXSju3bJUlfb9+usSkp5T5mxrRp/v/elZenCWPHatr06Xr6iSdUWloq75Ejcs6ddWOMf9crKUnX33CDrmvRQvl79kiSHhk8WBu2bFHHhARtXL9ekhQeHq7HHnhAR48elSS9NXeueiYlafzYsRV+3Msuu0wzZs/Ww/fdd86Xyy9ftkwjRo3SqpUrtfPrr9UrKUl9fv5zXXLJJZKky0/f2GP/GfPvLyxU6zZtJEmxTZroxIkTOnLkyFn7xJ7j5h8AgIqF9BXxudzQvr269+qlwqIi//ps2zZJp17k9J3PNmw469heSUlanJWl7Xv2aJ/Xq9fnzlXDhg31ox/9qNLnv7ZtW+3ZvdsfYUnatnWrDh8+rDZt2/q35eXm+iMsSfv27lWj87wg6pWZM7Vg3jytquB7w9/59VNP6evt2/XZtm361ufTH155RXMzM1VWVlbp+QEAF06tDHG9evW05N131aVDh3Lrhmuu0ScrV/r3Kz52rNxxVzZvrrffe09fbt6sgffeq27x8Rp1+ingyMjICz7nP0tLy73tnFN4+Ln/yHrcdpueevZZHSkt1ZHSUv3xT39SgwYNdKS0VIMffFDSqReN/bJfPzWKjlab5s11U5s2Onb0qHJ27pQkfXP6xh6Nz7ixR+PYWBWevrFH4b59ioqKUv369c+5DwCgckL6qWlJKvX5FH76adfvbNq4UX3vvVd5ubk6efJkpT/WTfHxCg8P1+hnnpFzp34h2c+Tk8vt4/P5/E/znstXW7eq2RVXKK5ZM/9VcZu2bfXjH/9YW7dsqfQ8Z7qtS5dy5/5Z374alZKipFtuUcEZ94E+ceKE9hYUKCIiQn3vvdf/wq7S0lJ9tmGDeiYl6b1FiySd+p52z6QkzXjlFUmnninw+XzqmZSkRadfKd2qdWtd2by51nLzDwAISMhfEefl5iqxc2dd2by5Lr/8coWFhWlGRoYuu+wyzX7rLXVMSFCLli31H//5n5o+a9Z5rzh37tihyMhIPT5ihK5q0UID7rtPQ0+/iOvfz+fxeNTzttt0+eWXV/iU9d+yspT9xRea9eab6nDTTYpPTNTMOXO08qOPKnw6vLK+2rZNW7Kz/Wtvfr7Kysq0JTtb3377rSQpoVMn9enXT1e1aKFbunXTovffV3h4uNJOv+BMktKnTNGDjzyiQYMH69o2bTTt1VdVNzpac0/f2MPr9eqNP/1Jk6dMUfeePdWhY0dNz8zUmtWr9Y+1a6s8PwDURiF/RTzt97/Xa2+8oQ1btqhu3bpqe9VV2pWXp6SuXTXxpZf01w8+UFRUlHbl5Snr/ffP+73SLzZvVsrTT2tUSorGT5qkVStX6vnUVL0+d65/n7WffqqZr76qN+bPV8OGDfX/xo3Ti+PHn/Wxkvv21R/S07Vs5UqVlZXpw/ff17MjRlTLY/DvLr30Uv3PCy+oRcuWOnr0qD5YskQP339/uRde/XnBAjVs1EhjJ0xQbJMm2rxpk+4548YeKU8/rbKyMr355z8rKipKWcuW6enhw6t9/otFjvUAFWjM/yMhxEVfpD89yU0fgGpAiIGaF0wh5qYPAABcJAgxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGCDEAAIYIMQAAhggxAACGIqwHAEJR47XWE1Sgk7OeAKhmYdYDVAlXxAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYCijEo0eP1rp16+T1elVYWKiFCxeqdevW1TUbAAAhL6AQ9+jRQxkZGbr55pvVu3dv1alTRx988IHq1q1bXfMBABDSIgLZ+c477yz39gMPPKADBw4oPj5eH3/88QUdDACA2iCgEJ+pfv36kqRDhw6dc5/IyEhFRUX53/Z4PD/klAAAhJQqv1grLCxMU6dO1SeffKLs7Oxz7peamiqv1+tf+fn5VT0lAAAhp8ohzsjIULt27fTLX/7yvPtNmjRJMTEx/hUXF1fVUwIAEHKq9NR0enq67rrrLnXv3v17r3B9Pp98Pl+VhgMAINQFHOL09HT169dPPXv2VG5ubjWMBABA7RFQiDMyMjRw4ED17dtXRUVFio2NlSQdOXJEx48fr5YBAQAIZQF9j3j48OFq0KCB/v73v2vfvn3+1b9//+qaDwCAkBbQFXFYWFh1zQEAQK3E75oGAMAQIQYAwBAhBgDAECEGAMAQIQYAwBAhBgDAECEGAMAQIQYAwBAhBgDAECEGAMAQIQYAwFCV7kcM4Pz2d7ae4GyN1/K74nEBdXLWE4QMrogBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwFGE9ABCKWlgPUJHO1gOgqo6ttZ4A1YkrYgAADBFiAAAMEWIAAAwRYgAADBFiAAAMEWIAAAwRYgAADBFiAAAMEWIAAAwRYgAADBFiAAAMEWIAAAwRYgAADBFiAAAM/aAQp6SkyDmntLS0CzUPAAC1SpVDnJCQoMcee0yff/75hZwHAIBapUohjo6O1ptvvqlHHnlEhw8fvtAzAQBQa1QpxBkZGVq8eLGWL1/+vftGRkbK4/GUWwAA4JSIQA/o37+/OnbsqMTExErtn5qaqnHjxgV6GgAAaoWAroibNWumadOmadCgQTpx4kSljpk0aZJiYmL8Ky4urkqDAgAQigK6Io6Pj1dsbKw2btz4rw8QEaHu3bvrySefVFRUlMrKysod4/P55PP5Lsy0AACEmIBCvHz5crVr167ctszMTG3btk0vvfTSWREGAADnF1CIjx49quzs7HLbjh07pm+++eas7QAA4Pvxm7UAADAU8Kumz9SrV68LMQcAALUSV8QAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYirAeAAhFx5yzHuFs68KsJwBQAa6IAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADAUcIibNm2quXPn6uDBgyouLtbmzZsVHx9fHbMBABDyArofcYMGDbRq1SqtWLFCd955pw4cOKBWrVrp8OHD1TUfAAAhLaAQp6SkaPfu3XrooYf823Jzcy/0TAAA1BoBPTXdp08frV+/XgsWLFBhYaE2btyooUOHnveYyMhIeTyecgsAAJwSUIhbtmypxx9/XNu3b9ftt9+uV199VS+//LIGDx58zmNSU1Pl9Xr9Kz8//wcPDQBAqAiT5Cq784kTJ7R+/Xp17drVv23atGlKTEzULbfcUuExkZGRioqK8r/t8XiUn5+vJjExKioqqvrkQBA75ir9z6rmrAuzngChpFPw/R2PDguev+Mej0f7vF7FVKJ1AV0R7927V1u2bCm3bevWrbryyivPeYzP51NRUVG5BQAATgkoxKtWrdK1115bblvr1q2Vl5d3QYcCAKC2CCjEaWlpuvnmm5Wamqqrr75aAwYM0KOPPqqMjIzqmg8AgJAWUIjXr1+vfv36acCAAfryyy/13HPPaeTIkZo3b151zQcAQEgL6OeIJWnx4sVavHhxdcwCAECtw++aBgDAECEGAMAQIQYAwBAhBgDAECEGAMAQIQYAwBAhBgDAECEGAMAQIQYAwBAhBgDAECEGAMBQwL9rGgAumCC8ubzWBc/N5f14nEIaV8QAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYIsQAABgixAAAGCLEAAAYirAeAEAtti7MeoKLA49TSOOKGAAAQ4QYAABDhBgAAEOEGAAAQ4QYAABDhBgAAEOEGAAAQ4QYAABDhBgAAEOEGAAAQ4QYAABDhBgAAEOEGAAAQ4QYAABDAYU4PDxcEyZM0M6dO1VcXKwdO3Zo7Nix1TUbAAAhL6D7EaekpOjxxx/XkCFDlJ2drYSEBGVmZurIkSNKT0+vrhkBAAhZAYX4lltu0aJFi7RkyRJJUl5engYMGKBOnTpVy3AAAIS6gJ6aXr16tZKSktSqVStJ0o033qhu3bpp6dKl5zwmMjJSHo+n3AIAAKcEdEU8efJkxcTEaNu2bTp58qQuueQSjRkzRvPmzTvnMampqRo3btwPnRMAgJAU0BVxcnKyBg0apIEDB6pjx44aMmSInn32WQ0ePPicx0yaNEkxMTH+FRcX94OHBgAgVAR0Rfy73/1OkydP1vz58yVJX375pZo3b67U1FTNmTOnwmN8Pp98Pt8PnxQAgBAU0BVx3bp1VVZWVm7byZMnFR7OjyMDAFAVAV0Rv/vuuxozZox27dql7Oxs3XTTTRo1apRmzZpVXfMBABDSAgrxiBEjNHHiRP3xj39U48aNVVBQoBkzZmjChAnVNR8AACEtTJKryRN6PB55vV41iYlRUVFRTZ4aqDHHXI3+s6qcdWHWEwDVKrqz9QT/4vF4tM/rVUwlWsc3dwEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwRIgBADBEiAEAMESIAQAwFNBNHwBUTnQYv9cZQOVwRQwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAoQirE9fzeKxODQBAtQqkcTUeYs/p4Xbk59f0qQEAqFEej0dFRUXn3SdMkquZcf6ladOm3zvY9/F4PMrPz1dcXNwP/lihjMepcnicKofHqXJ4nCon1B8nj8ejgoKC793P5KnpygxWWUVFRSH5B3ih8ThVDo9T5fA4VQ6PU+WE6uNU2c+JF2sBAGCIEAMAYOiiDfGJEyc0btw4nThxwnqUoMbjVDk8TpXD41Q5PE6Vw+N0ismLtQAAwCkX7RUxAAChgBADAGCIEAMAYIgQAwBg6KIN8fDhw5WTk6OSkhKtWbNGiYmJ1iMFldGjR2vdunXyer0qLCzUwoUL1bp1a+uxglpKSoqcc0pLS7MeJeg0bdpUc+fO1cGDB1VcXKzNmzcrPj7eeqygEh4ergkTJmjnzp0qLi7Wjh07NHbsWOuxzN16663661//qvz8fDnn1Ldv37P2GT9+vAoKClRcXKwPP/xQ11xzjcGkttzFtpKTk93x48fdAw884Nq2betmzJjhDh065Bo1amQ+W7CspUuXuiFDhrjrrrvO3Xjjje69995zubm5rm7duuazBeNKSEhwO3fudJs2bXJpaWnm8wTTatCggcvJyXGzZs1yiYmJ7qqrrnK9e/d2LVu2NJ8tmFZqaqo7cOCA++lPf+qaN2/u7r33Xuf1et2IESPMZ7Ncd9xxh5s4caK75557nHPO9e3bt9z7f/Ob37jDhw+7Pn36uBtuuMG988477uuvv3ZRUVHms9fgMh8g4LVmzRqXnp7ufzssLMzt2bPHpaSkmM8WrKthw4bOOeduvfVW81mCbUVHR7uvvvrKJSUluRUrVhDiM9akSZPcypUrzecI9vXuu++6119/vdy2t99+282dO9d8tmBZFYW4oKDAPfPMM/63Y2JiXElJievfv7/5vDW1LrqnpuvUqaP4+HhlZWX5tznnlJWVpS5duhhOFtzq168vSTp06JDxJMEnIyNDixcv1vLly61HCUp9+vTR+vXrtWDBAhUWFmrjxo0aOnSo9VhBZ/Xq1UpKSlKrVq0kSTfeeKO6deumpUuXGk8WvFq0aKGf/OQn5b6ee71erV27tlZ9PTe7H3FVNWzYUBERESosLCy3vbCwUG3atDGaKriFhYVp6tSp+uSTT5SdnW09TlDp37+/OnbsyGsMzqNly5Z6/PHHNWXKFL344otKTEzUyy+/LJ/Ppzlz5liPFzQmT56smJgYbdu2TSdPntQll1yiMWPGaN68edajBa0mTZpIUoVfz797X21w0YUYgcvIyFC7du3UrVs361GCSrNmzTRt2jT17t271v+KvfMJDw/X+vXrNWbMGEnSpk2b1K5dOw0bNowQ/5vk5GQNGjRIAwcOVHZ2tjp06KCpU6eqoKCAxwnfy/z58UBWnTp1XGlp6VnfZ5g9e7Z75513zOcLtpWenu527drlrrrqKvNZgm317dvXOedcaWmpfznn3MmTJ11paakLDw83nzEYVm5urps5c2a5bcOGDXN79uwxny2Y1q5du9zw4cPLbRszZozbunWr+WzBss78HnGLFi2cc861b9++3H4fffSRmzp1qvm8NbUuuu8Rl5aWasOGDUpKSvJvCwsLU1JSkj799FPDyYJPenq6+vXrp9tuu025ubnW4wSd5cuXq127durQoYN//eMf/9Cbb76pDh06qKyszHrEoLBq1Spde+215ba1bt1aeXl5RhMFp7p16571d+bkyZMKD7/ovszWmJycHO3du7fc13OPx6POnTvXuq/n5v83EOhKTk52JSUlbvDgwa5NmzZu+vTp7tChQ65x48bmswXLysjIcIcPH3bdu3d3sbGx/nXppZeazxbMi1dNn70SEhKcz+dzqamp7uqrr3YDBgxwR48edQMHDjSfLZhWZmam2717t//Hl+655x63f/9+N3nyZPPZLFd0dLRr3769a9++vXPOuZEjR7r27du7K664wkmnfnzp0KFD7u6773bt2rVzCxcu5MeXLpb1xBNPuNzcXHf8+HG3Zs0a16lTJ/OZgmmdy5AhQ8xnC+ZFiCteP/vZz9zmzZtdSUmJ27Jlixs6dKj5TMG26tWr59LS0lxubq4rLi52O3bscBMnTnR16tQxn81y9ejRo8KvRZmZmf59xo8f7/bu3etKSkrchx9+6Fq1amU+d00uboMIAIAhvnkBAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAIUIMAIAhQgwAgCFCDACAof8D/O92bO34CakAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# animate visited tracks    \n",
    "fig, ax = plt.subplots()\n",
    "image = plt.imshow(track.course, cmap='hot', interpolation='none')\n",
    "time_text = ax.text(0.05, 0.9, '', transform=ax.transAxes)\n",
    "\n",
    "def get_render_func(_track_maps_l):\n",
    "    def animate(it):\n",
    "        track_map = _track_maps_l[it]\n",
    "        #image.set_array(track.course)\n",
    "        image.set_array(track_map)\n",
    "        time_text.set_text(f\"Iteration {it}\")\n",
    "        return image, time_text\n",
    "    return animate\n",
    "\n",
    "def init():\n",
    "    image.set_array(track.course)\n",
    "    return [image]\n",
    "\n",
    "ani = animation.FuncAnimation(fig, get_render_func(track_maps_l), frames=range(0, len(track_maps_l), 100), \n",
    "                              interval=100, blit=True, init_func=init)\n",
    "ani.save(\"solution_2.gif\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-0861c8750a2997ae",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "![SegmentLocal](solution_2.gif \"segment\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-66a45f80f155ca39",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Use the code block directly below to test the resulting deterministic greedy policy (several samples are taken in order to show behavior in all different starting positions)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-ba1f0a2326526aeb",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "MovieWriter ffmpeg unavailable; using Pillow instead.\n"
     ]
    }
   ],
   "source": [
    "pos_maps_over_eps_l = []\n",
    "no_episodes = 10\n",
    "for e in range(no_episodes):\n",
    "\n",
    "    _, _, _, _, pos_map = gather_experience(pi, max_episode_len, deterministic=True)\n",
    "    pos_map = (pos_map > 0).astype(np.int16)\n",
    "    pos_map +=  track.course  # overlay track course\n",
    "    pos_maps_over_eps_l.append(pos_map)\n",
    "\n",
    "\n",
    "ani = animation.FuncAnimation(fig, get_render_func(pos_maps_over_eps_l),\n",
    "                              frames=range(0, len(pos_maps_over_eps_l), 1), \n",
    "                              interval=500, blit=True, init_func=init)\n",
    "ani.save(\"solution_2_2.gif\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-70e585406cef8528",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "![SegmentLocal](solution_2_2.gif \"segment\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-75a92b1a891b9346",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## 3) Extra Challenge: A More Complex Course"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-9eb7640363641603",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "The track given below poses a substantially harder challenge for Monte-Carlo based algorithms. Why? If you want to try solving it yourself, be aware that it may take much longer until a successful policy is found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "from utils import build_rect_course"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-7fdf744535830e4d",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WWWWWWWWWWWW\n",
      "Woooo+W-oooW\n",
      "Woooo+W-oooW\n",
      "Woooo+W-oooW\n",
      "WooWWWWWWooW\n",
      "WooWWWWWWooW\n",
      "WooooooooooW\n",
      "WooooooooooW\n",
      "WooooooooooW\n",
      "WWWWWWWWWWWW\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeIAAAGdCAYAAADOnXC3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAUnElEQVR4nO3de2yV9f3A8U+RywKe6h8ijLp5Bd3CwHHxNi/bGjLdRSRLYGAixrhN2UyIW4YNLlFMBssSQUm3GRdxGk1mlujmmNmEuIsXdGiUiLqMSb0UrTKIrRYpw+/vD3+/7teJ2oO0n8Pp65V8/+jD8/B8eCDnzXPO6WlDRJQAAFIMyx4AAIYyIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBINHwjJNOmDAhurq6Mk4NAIOiUqnEtm3bPnS/QQ/xhAkTor29fbBPCwCDrqmp6UNjPOgh/r874ROamuJNd8UA1KFDK5XY0t7er2d/U56ajoh4s6vL09MADHnerAUAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAiYQYABIJMQAk2q8QL1q0KLZu3Rq7du2KDRs2xMyZMw/0XAAwJFQd4rlz58b1118f1157bUybNi2eeuqp+MMf/hBjx44diPkAoK5VHeIrr7wybr755rj11lvj2Wefjcsuuyy6u7vjkksuGYj5AKCuVRXiESNGxPTp02PdunW920opsW7dujj99NP3eczIkSOjUqn0WQDAu6oK8RFHHBHDhw+Pjo6OPts7Ojpi/Pjx+zympaUlOjs7e1d7e/v+TwsAdWbA3zW9fPnyaGxs7F1NTU0DfUoAOGgMr2bn7du3x7///e8YN25cn+3jxo2LV199dZ/H9PT0RE9Pz/5PCAB1rKo74j179sTjjz8ezc3NvdsaGhqiubk5HnnkkQM+HADUu6ruiCMirr/++vjlL38ZGzdujMceeywWL14cY8aMiTVr1gzEfABQ16oO8V133RVjx46NZcuWxfjx4+PJJ5+Mc889N1577bWBmA8A6lpDRJTBPGGlUonOzs4Y39gYXV1dg3lqABgUlUolXu3sjMZ+tM5nTQNAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJqv6hD/XqrUezJ6CunDKoH+HeL2MaGrJHeI+t2QPsw5EeCw5aY07NnmD/uCMGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAElUV4quuuioee+yx6OzsjI6Ojrj77rtj0qRJAzUbANS9qkJ8zjnnRGtra5x22mkxa9asGDFiRPzxj3+M0aNHD9R8AFDXhlez83nnndfn64svvjhef/31mD59evz1r389oIMBwFBQVYj/22GHHRYRETt27HjffUaOHBmjRo3q/bpSqXyUUwJAXdnvN2s1NDTEqlWr4sEHH4zNmze/734tLS3R2dnZu9rb2/f3lABQd/Y7xK2trTF58uT4xje+8YH7LV++PBobG3tXU1PT/p4SAOrOfj01vXr16vjqV78aZ5999ofe4fb09ERPT89+DQcA9a7qEK9evTrmzJkTn//856OtrW0ARgKAoaOqELe2tsaCBQti9uzZ0dXVFePGjYuIiDfeeCPefvvtARkQAOpZVa8RL1q0KA4//PD485//HK+++mrvmjdv3kDNBwB1rao74oaGhoGaAwCGJJ81DQCJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgETDswfg4DLm1OwJDhYN2QMcFI7NHmBfavDf+FuPZk/AQHJHDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASDRRwrxkiVLopQSK1euPFDzAMCQst8hnjFjRnz729+Op5566kDOAwBDyn6FeMyYMXHHHXfEN7/5zdi5c+eBngkAhoz9CnFra2usXbs21q9f/6H7jhw5MiqVSp8FALxreLUHzJs3L6ZNmxYzZ87s1/4tLS1xzTXXVHsaABgSqrojPuqoo+KGG26ICy+8MHbv3t2vY5YvXx6NjY29q6mpab8GBYB6VNUd8fTp02PcuHHxxBNP/Oc3GD48zj777Pjud78bo0aNinfeeafPMT09PdHT03NgpgWAOlNViNevXx+TJ0/us23NmjXx3HPPxY9//OP3RBgA+GBVhfjNN9+MzZs399n21ltvxb/+9a/3bAcAPpxP1gKARFW/a/q/feELXzgQcwDAkOSOGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImGZw/AweWtR7MnAKgv7ogBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJKo6xBMmTIjbb789tm/fHt3d3bFp06aYPn36QMwGAHWvqp9HfPjhh8dDDz0UDzzwQJx33nnx+uuvx8SJE2Pnzp0DNR8A1LWqQrxkyZJ46aWX4pJLLund1tbWdqBnAoAho6qnps8///zYuHFj3HXXXdHR0RFPPPFEXHrppR94zMiRI6NSqfRZAMC7qgrxcccdF5dffnn84x//iC996Uvxs5/9LG688ca46KKL3veYlpaW6Ozs7F3t7e0feWgAqBcNEVH6u/Pu3btj48aN8bnPfa532w033BAzZ86MM844Y5/HjBw5MkaNGtX7daVSifb29hjf2BhdXV37P/kB9taj2RMA8FGMOTV7gv+oVCrxamdnNPajdVXdEb/yyivxzDPP9Nn27LPPxic/+cn3Paanpye6urr6LADgXVWF+KGHHooTTzyxz7ZJkybFCy+8cECHAoChoqoQr1y5Mk477bRoaWmJ448/PubPnx/f+ta3orW1daDmA4C6VlWIN27cGHPmzIn58+fH008/HT/84Q9j8eLFceeddw7UfABQ16r6PuKIiLVr18batWsHYhYAGHJ81jQAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImEGAASCTEAJBJiAEgkxACQSIgBIJEQA0AiIQaAREIMAImqCvGwYcNi2bJl8fzzz0d3d3ds2bIlrr766oGaDQDq3vBqdl6yZElcfvnlsXDhwti8eXPMmDEj1qxZE2+88UasXr16oGYEgLpVVYjPOOOM+M1vfhO///3vIyLihRdeiPnz58cpp5wyIMMBQL2r6qnphx9+OJqbm2PixIkRETFlypQ488wz47777nvfY0aOHBmVSqXPAgDeVdUd8YoVK6KxsTGee+652Lt3bxxyyCGxdOnSuPPOO9/3mJaWlrjmmms+6pwAUJequiOeO3duXHjhhbFgwYKYNm1aLFy4ML7//e/HRRdd9L7HLF++PBobG3tXU1PTRx4aAOpFVXfEP/nJT2LFihXxq1/9KiIinn766Tj66KOjpaUlbrvttn0e09PTEz09PR99UgCoQ1XdEY8ePTreeeedPtv27t0bw4b5dmQA2B9V3RHfe++9sXTp0njxxRdj8+bN8dnPfjauvPLKuOWWWwZqPgCoa1WF+IorrojrrrsufvrTn8aRRx4Z27Zti5tuuimWLVs2UPMBQF1riIgymCesVCrR2dkZ4xsbo6urazBP/YHeejR7AgA+ijGnZk/wH5VKJV7t7IzGfrTOi7sAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAoqp+6EM9q6XPKAVg6HBHDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQSYgBIJMQAkEiIASCREANAIiEGgERCDACJhBgAEgkxACQannXiQyuVrFMDwICqpnGDHuLK/w63pb19sE8NAIOqUqlEV1fXB+7TEBFlcMb5jwkTJnzoYB+mUqlEe3t7NDU1feTfq565Tv3jOvWP69Q/rlP/1Pt1qlQqsW3btg/dL+Wp6f4M1l9dXV11+Rd4oLlO/eM69Y/r1D+uU//U63Xq75/Jm7UAIJEQA0CigzbEu3fvjmuuuSZ2796dPUpNc536x3XqH9epf1yn/nGd3pXyZi0A4F0H7R0xANQDIQaAREIMAImEGAASHbQhXrRoUWzdujV27doVGzZsiJkzZ2aPVFOuuuqqeOyxx6KzszM6Ojri7rvvjkmTJmWPVdOWLFkSpZRYuXJl9ig1Z8KECXH77bfH9u3bo7u7OzZt2hTTp0/PHqumDBs2LJYtWxbPP/98dHd3x5YtW+Lqq6/OHivdWWedFb/97W+jvb09Sikxe/bs9+xz7bXXxrZt26K7uzvuv//+OOGEExImzVUOtjV37tzy9ttvl4svvrh86lOfKjfddFPZsWNHGTt2bPpstbLuu+++snDhwvLpT3+6TJkypfzud78rbW1tZfTo0emz1eKaMWNGef7558uTTz5ZVq5cmT5PLa3DDz+8bN26tdxyyy1l5syZ5ZhjjimzZs0qxx13XPpstbRaWlrK66+/Xr785S+Xo48+unz9618vnZ2d5YorrkifLXOde+655brrrisXXHBBKaWU2bNn9/n1H/zgB2Xnzp3l/PPPL5/5zGfKPffcU/75z3+WUaNGpc8+iCt9gKrXhg0byurVq3u/bmhoKC+//HJZsmRJ+my1uo444ohSSilnnXVW+iy1tsaMGVP+/ve/l+bm5vLAAw8I8X+t5cuXl7/85S/pc9T6uvfee8svfvGLPtt+/etfl9tvvz19tlpZ+wrxtm3byve+973erxsbG8uuXbvKvHnz0ucdrHXQPTU9YsSImD59eqxbt653Wykl1q1bF6effnriZLXtsMMOi4iIHTt2JE9Se1pbW2Pt2rWxfv367FFq0vnnnx8bN26Mu+66Kzo6OuKJJ56ISy+9NHusmvPwww9Hc3NzTJw4MSIipkyZEmeeeWbcd999yZPVrmOPPTY+/vGP93k87+zsjEcffXRIPZ6n/Tzi/XXEEUfE8OHDo6Ojo8/2jo6OOOmkk5Kmqm0NDQ2xatWqePDBB2Pz5s3Z49SUefPmxbRp07zH4AMcd9xxcfnll8f1118fP/rRj2LmzJlx4403Rk9PT9x2223Z49WMFStWRGNjYzz33HOxd+/eOOSQQ2Lp0qVx5513Zo9Ws8aPHx8Rsc/H8//7taHgoAsx1WttbY3JkyfHmWeemT1KTTnqqKPihhtuiFmzZg35j9j7IMOGDYuNGzfG0qVLIyLiySefjMmTJ8dll10mxP/P3Llz48ILL4wFCxbE5s2b4+STT45Vq1bFtm3bXCc+VPrz49WsESNGlD179rzndYZbb7213HPPPenz1dpavXp1efHFF8sxxxyTPkutrdmzZ5dSStmzZ0/vKqWUvXv3lj179pRhw4alz1gLq62trdx88819tl122WXl5ZdfTp+tltaLL75YFi1a1Gfb0qVLy7PPPps+W62s/36N+Nhjjy2llDJ16tQ++/3pT38qq1atSp93sNZB9xrxnj174vHHH4/m5ubebQ0NDdHc3ByPPPJI4mS1Z/Xq1TFnzpz44he/GG1tbdnj1Jz169fH5MmT4+STT+5df/vb3+KOO+6Ik08+Od55553sEWvCQw89FCeeeGKfbZMmTYoXXnghaaLaNHr06Pf8m9m7d28MG3bQPcwOmq1bt8Yrr7zS5/G8UqnEqaeeOuQez9P/N1Dtmjt3btm1a1e56KKLykknnVR+/vOflx07dpQjjzwyfbZaWa2trWXnzp3l7LPPLuPGjetdH/vYx9Jnq+XlXdPvXTNmzCg9PT2lpaWlHH/88WX+/PnlzTffLAsWLEifrZbWmjVryksvvdT77UsXXHBBee2118qKFSvSZ8tcY8aMKVOnTi1Tp04tpZSyePHiMnXq1PKJT3yiRLz77Us7duwoX/va18rkyZPL3Xff7duXDpb1ne98p7S1tZW33367bNiwoZxyyinpM9XSej8LFy5Mn62WlxDve33lK18pmzZtKrt27SrPPPNMufTSS9NnqrV16KGHlpUrV5a2trbS3d1dtmzZUq677royYsSI9Nky1znnnLPPx6I1a9b07nPttdeWV155pezatavcf//9ZeLEielzD+byYxABIJEXLwAgkRADQCIhBoBEQgwAiYQYABIJMQAkEmIASCTEAJBIiAEgkRADQCIhBoBEQgwAif4HQIdmM39Bp7EAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Build the course\n",
    "_course_dim = (8, 10)\n",
    "_inner_wall_dim = (2, 6)\n",
    "    \n",
    "course = build_rect_course(_course_dim, _inner_wall_dim)\n",
    "track = RaceTrackEnv(course)\n",
    "for row in course:\n",
    "    print(row)\n",
    "    \n",
    "pos_map =  track.course  # overlay track course\n",
    "plot_pos_map(pos_map)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-6382c23e5d25c036",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "## 3) Solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-15500169957f16d3",
     "locked": false,
     "points": 0,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "source": [
    "Taking four turns to reach the goal is way harder than taking just two turns. Additionally, the state space is a lot larger now, which leads to much more exploration being necessary until all the states are properly evaluated. Although the course is more complicated, the problem description (\"reach the goal\") and the evironment physics (acceleration, momentum and collision) are still the same. Thus, there is no fundamental reason why Monte-Carlo should not be successful here, we just have to be aware that it will take some time.\n",
    "\n",
    "Fortunately, there are still upcoming lectures where more efficient learning algorithms could be discussed ;)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-9f170e15782def02",
     "locked": false,
     "points": 0,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "source": [
    "The following screenshot was taken after trying to solve this problem with the same algorithm as presented in task 2). As can be seen, the agent is actually able to solve the racetrack and reach the finish line. But it took about six hours on a very powerful computer to do so.\n",
    "\n",
    "![](FullCourse_MonteCarlo_Solved.png)\n"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Create Assignment",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
